A New Framework for Distributed Multivariate Feature Selection
Abstract:
Feature selection is an important problem in classification. Selecting good features by maximizing relevance to the class label while minimizing redundancy among features improves classification accuracy. However, most current feature selection algorithms are centralized. In this paper, we propose a distributed version of the mRMR feature selection approach, in which features are selected based on maximum relevance to the class and minimum redundancy among the features. The proposed method consists of six stages. In the first stage, after the training and test data are determined, the training data are partitioned into subsets, each containing the same number of features. In the second stage, each subset of features is scored using the mRMR criterion. In the third stage, the higher-ranked features in each subset are selected and the rest are eliminated. In the fourth stage, the eliminated features are voted on. In the fifth stage, the selected features are merged to form the final feature set. In the final stage, classification accuracy is evaluated using the final training data and the test data. The quality of our method has been evaluated on six datasets. The results show that the proposed method improves classification accuracy, and reduces runtime, compared to methods based only on maximum relevance to the class label.
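The partition-score-merge idea behind the pipeline above can be sketched in code. The following is a minimal illustration under stated assumptions, not the paper's exact algorithm: it assumes the features are split into equal-sized subsets, uses a greedy mutual-information formulation of mRMR (relevance minus mean redundancy), and omits the voting stage; all function names (`mutual_info`, `mrmr`, `distributed_mrmr`) are our own.

```python
import numpy as np

def mutual_info(a, b):
    """Mutual information (in nats) between two discrete 1-D arrays."""
    a_vals, a_idx = np.unique(a, return_inverse=True)
    b_vals, b_idx = np.unique(b, return_inverse=True)
    joint = np.zeros((len(a_vals), len(b_vals)))
    np.add.at(joint, (a_idx, b_idx), 1)          # contingency counts
    joint /= joint.sum()                          # joint distribution
    pa = joint.sum(axis=1, keepdims=True)         # marginal of a
    pb = joint.sum(axis=0, keepdims=True)         # marginal of b
    nz = joint > 0
    return float((joint[nz] * np.log(joint[nz] / (pa @ pb)[nz])).sum())

def mrmr(X, y, k):
    """Greedy mRMR: pick k features maximizing relevance to y
    minus mean redundancy with already-selected features."""
    n_feat = X.shape[1]
    relevance = np.array([mutual_info(X[:, j], y) for j in range(n_feat)])
    selected = [int(np.argmax(relevance))]        # most relevant feature first
    while len(selected) < k:
        best, best_score = None, -np.inf
        for j in range(n_feat):
            if j in selected:
                continue
            redundancy = np.mean([mutual_info(X[:, j], X[:, s]) for s in selected])
            score = relevance[j] - redundancy
            if score > best_score:
                best, best_score = j, score
        selected.append(best)
    return selected

def distributed_mrmr(X, y, n_subsets, k_per_subset):
    """Partition the feature indices into equal-sized subsets, run mRMR
    locally on each subset, and merge the locally selected features."""
    rng = np.random.default_rng(0)
    perm = rng.permutation(X.shape[1])
    final = []
    for part in np.array_split(perm, n_subsets):  # one "node" per subset
        local = mrmr(X[:, part], y, k_per_subset)
        final.extend(int(part[j]) for j in local) # map back to global indices
    return sorted(final)
```

In a real distributed deployment, each call to `mrmr` inside the loop would run on a separate node, and only the selected index sets would be communicated and merged.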
full textMy Resources
Journal title
volume 19, issue 4
pages 15-36
publication date 2023-03